Mapping spatial objects with tmap
Taste of Spatial Analysis
February 2018
Mapping spatial objects with tmap
Taste of Spatial Analysis
Open RStudio & a Script file
Set working directory
Load Libraries (rgeos new - install if needed)
Load our data from part 1
Open the slide deck to follow along: http://bit.ly/geospatial_workshop_part2
Let's load the R libraries we will use
library(sp) # spatial objects and methods library(rgdal) # read and write from file library(rgeos) # geometric operations library(tmap) # mapping spatial objects
## Warning: package 'rgdal' was built under R version 3.4.3
## rgdal: version: 1.2-16, (SVN revision 701) ## Geospatial Data Abstraction Library extensions to R successfully loaded ## Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01 ## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal ## GDAL binary built with GEOS: FALSE ## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493] ## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj ## Linking to sp version: 1.2-5
## Warning: package 'rgeos' was built under R version 3.4.2
## rgeos version: 0.3-26, (SVN revision 560) ## GEOS runtime version: 3.6.1-CAPI-1.10.1 r0 ## Linking to sp version: 1.2-5 ## Polygon checking: TRUE
## Warning: package 'tmap' was built under R version 3.4.3
In the last workshop we:
read.csvggplot and ggmapSpatialPointsDataFrames with the sp::coordinatesrgdal::readOGRCRSs with the function sp::proj4string and sp::CRSsp::spTransformsfboundary_lonlat as a shapefile using rgdal::writeOGRBelow are the files we will work with.
All of these are in the data folder.
Use setwd to set your working directory to the location of the tutorial files.
For example:
setwd("~/Documents/Dlab/workshops/2018/rgeo/r-geospatial-workshop/r-geospatial-workshop")
Read in from CSV file
sfhomes <- read.csv('data/sf_properties_25ksample.csv', stringsAsFactors = FALSE)
sfhomes15 <- subset(sfhomes, as.numeric(SalesYear) == 2015)
class(sfhomes15)
## [1] "data.frame"
Make it spatial
sfhomes15_sp <- sfhomes15 # why do we do this?
coordinates(sfhomes15_sp) <- c('lon','lat')
proj4string(sfhomes15_sp) <- CRS("+init=epsg:4326")
Read in the sfboundary.shp and sf_highways.shp shapefiles
sf_boundary <- readOGR(dsn="data", layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile ## Source: "data", layer: "sf_boundary" ## with 1 features ## It has 3 fields
sf_highways <- readOGR(dsn="data", layer="sf_highways")
## OGR data source with driver: ESRI Shapefile ## Source: "data", layer: "sf_highways" ## with 246 features ## It has 5 fields
Read in the bart.csv file
Promote the data frame to a SpatialPointsDataFrame
bart<- read.csv("data/bart.csv", stringsAsFactors = FALSE)
coordinates(bart) <- c("X","Y")
Are they all the same? Describe.
proj4string(sf_highways)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sf_boundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(bart)
## [1] NA
proj4string(sfhomes15_sp)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
SpatialPointsDataFrame will not have a defined CRS if they were created from a data frame
We need to define the CRS of the 'bart' object
What do we need to know to do this
Try it for the bart layer
In GIS a spatial object is called a layer
One element in a layer is called a feature
A feature consists of the geometry and the attribute data that describe it
Where is the CRS definition coming from?
proj4string(bart) <- CRS("+init=epsg:4326")
How else could we reference the CRS?
In order to perform spatial analysis we need to first convert all data objects to a common CRS.
Which type? Projected or Geographic CRS?
If my goal is to create maps, I may convert all data to a geographic CRS.
If my goal is to do spatial analysis, I will convert to a projected CRS.
Geographic CRSs
4326 Geographic, WGS84 (default for lon/lat)
4269 Geographic, NAD83 (USA Fed agencies like Census)
Projected CRSs
5070 USA Contiguous Albers Equal Area Conic
3310 CA ALbers Equal Area
26910 UTM Zone 10, NAD83 (Northern Cal)
3857 Web Mercator (web maps)
Use spTransform to transform bart and sfhomes15_sp to UTM 10N, NAD83
sf_highways and sf_boundary already have this CRS
Recall, this transformation is called projecting or reprojecting
Try it….
Note the two methods for doing same thing.
sfhomes15_utm <- spTransform(sfhomes15_sp, CRS("+init=epsg:26910"))
bart_utm <- spTransform(bart, CRS(proj4string(sf_boundary)))
Do they match?
proj4string(bart_utm)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sfhomes15_utm)
## [1] "+init=epsg:26910 +proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
#sfhomes15_utm <- spTransform(sfhomes15_sp, CRS("+init=epsg:26910"))
sfhomes15_utm <- spTransform(sfhomes15_sp, CRS(proj4string(sf_boundary)))
# check
proj4string(sfhomes15_utm)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sf_boundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
plot(sf_boundary) lines(sf_highways, col='purple', lwd=4) points(sfhomes15_utm) plot(bart_utm, col="red", pch=15, add=T)
tmap stands for thematic map
It's a powerful toolkit for creating maps with sp objects
yet, with less code than the alternatives
Syntax should be familar to ggplot2 users, but simpler
Relatively easy to create & save interactive maps
Load the library
library(tmap)
You may need to install /load dependencies - ggplot2, RColorbrewer, classInt, leaflet libraries
tm_shape(sf_boundary) + tm_polygons(col="beige", border.col="black")
Use tm_shape(<sp_object>) to specifiy a sp object
Add + tm_<element>(...) to style the layer symbology
…and other options for creating a publication ready map
tm_shape(sf_boundary) + tm_polygons(col="beige", border.col="black")
?tmap
?tmap_shape
?tmap_element
?tm_symbols (tm_dots, etc…)
tm_shape(sfhomes15_utm) + tm_dots(col="red", size=.25)
tm_shape(sf_highways) + tm_lines(col="black")
tm_shape(sfhomes15_sp) + tm_dots(col="totvalue", size=.25) # column names must be quoted
How do you think we would map all sp objects together?
tm_shape(sf_boundary) + tm_polygons(col="beige", border.col="black") + tm_shape(sf_highways) + tm_lines(col="black") + tm_shape(sfhomes15_utm) + tm_dots(col="totvalue", size=.25)
Below we are mapping sf_boundary (UTM10) and sfhomes15_sp (WGS84)
tm_shape(sf_boundary) + tm_polygons(col="beige", border.col="black") + tm_shape(sfhomes15_sp) + tm_dots(col="totvalue", size=.25)
If the CRSs are defined, tmap will use that info
If not, tmap will assume WGS84
and dynamically reproject subsequent layers to match first one added to map
tm_shape(sf_boundary) + tm_polygons(col="beige", border.col="black") + tm_shape(sfhomes15_utm) + tm_dots(col="totvalue", size=.25, title = "San Francisco Property Values (2015)") + tm_layout(inner.margins=c(.05, .2, .15, .05)) # bottom, left, top, right
tmap_mode("view")
## tmap mode set to interactive viewing
last_map() # display the last map
last_map() # display the last map
tmap_mode("plot")
## tmap mode set to plotting
Questions?
Check out tmap in a Nutshell
Mapping / plotting to see location and distribution
Asking questions of, or querying, your data.
Cleaning & reshaping the data
Applying analysis methods
Mapping analysis results
Repeat as needed
There are two key types of spatial queries
These types are often combined, e.g.
What is the area of SF Boundary?
rgeos::gArea to computegArea(sf_boundary)
## [1] 119949901
Use the rgeos function gArea to compute area of polygon spatial objects.
gArea(sf_boundary) / (1000 * 1000)
## [1] 119.9499
Use the rgeos function gLength to compute length of linear spatial objects.
gLength(sf_highways) / 1000
## [1] 39.83624
The rgeos function gDistance will return the min distance between two geometries.
Distance in KM between Embarcadero & Powell St Bart stations
gDistance(bart_utm[bart_utm$STATION == 'EMBARCADERO',],
bart_utm[bart_utm$STATION == 'POWELL STREET',]) /1000
## [1] 1.334997
Get distance between all sfhomes and Embarcadero BART
dist2emb<- gDistance(bart_utm[bart_utm$STATION == 'EMBARCADERO',],
sfhomes15_utm, byid=TRUE) /1000
# check output
nrow(dist2emb)
## [1] 835
nrow(sfhomes15_utm)
## [1] 835
head(dist2emb)
## 30 ## 24 9.356050 ## 35 3.775303 ## 76 2.381774 ## 79 11.130542 ## 85 4.409883 ## 90 1.660183
How is the output different if byid=FALSE (default)
#dist2emb<- gDistance(bart_utm[bart_utm$STATION == 'EMBARCADERO',],
# sfhomes15_utm, byid=TRUE) /1000
gDistance(bart_utm[bart_utm$STATION == 'EMBARCADERO',],
sfhomes15_utm) /1000
## [1] 0.2974795
Spatial relationship queries compare the geometries of two spatial objects in the same coordinate space (CRS).
There are many, often similar, functions to perform these queries (can be confusing!).
These operations may return logical values, lists, matrices, dataframes, geometries or spatial objects
you need to check what type of object is returned
you need to check what values are returned to make sure they make sense
CRSs DO need to match
But …
they don't need to be Projected CRSs
They can any CRS
This is a very common type of spatial query called a point-in-polygon query.
We can use the rgeos function gIntersects to answer this.
We already know the answer but we want to see how it is done.
What is the output of gIntersects?
bart_stations_in_sf <-gIntersects(bart_utm, sf_boundary) # bart_stations_in_sf
What is the output of gIntersects?
bart_stations_in_sf <-gIntersects(bart_utm, sf_boundary) bart_stations_in_sf
## [1] TRUE
Same function, but byid=TRUE What is the output of gIntersects this time?
sfbart_stations <-gIntersects(bart_utm, sf_boundary, byid=TRUE) # class(sfbart_stations) # sfbart_stations
What is the output of gIntersects?
sfbart_stations <-gIntersects(bart_utm, sf_boundary, byid=TRUE) class(sfbart_stations)
## [1] "matrix"
sfbart_stations
## 1 2 3 4 5 6 7 8 9 10 11 12 ## 0 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 13 14 15 16 17 18 19 20 21 22 23 24 ## 0 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 25 26 27 28 29 30 31 32 33 34 35 36 37 ## 0 FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## 38 39 40 41 42 43 44 ## 0 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Why is it different?
We can use the result of gIntersects to view the names of the bart stations
bart_utm[as.vector(sfbart_stations),]$STATION
## [1] "EMBARCADERO" "MONTGOMERY STREET" ## [3] "POWELL STREET" "CIVIC CENTER/ UN PLAZA" ## [5] "16TH STREET & MISSION" "24TH STREET & MISSION" ## [7] "GLEN PARK" "BALBOA PARK"
We can use the result of gIntersects to subset bart_utm
sfbart_utm <- bart_utm[as.vector(sfbart_stations),]
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(sf_boundary) + tm_polygons(col="beige", border.col="black") + tm_shape(sfbart_utm) + tm_dots(col="red")
tmap to plot modetmap_mode("plot")
## tmap mode set to plotting
rgeos::gIntersects(geom1, geom2) returns
byid=FALSE (the default when not specified)byid=TRUELet's consider the sfhomes15_utm data along with SF Census tract data
Use readOGR to load the shapefile sftracts_wpop.shp
data folderExplore the data
sp object is it?sftracts <- readOGR(dsn="./data", layer="sftracts_wpop")
## OGR data source with driver: ESRI Shapefile ## Source: "./data", layer: "sftracts_wpop" ## with 195 features ## It has 10 fields
class(sftracts)
## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"
proj4string(sftracts)
## [1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
head(sftracts@data)
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ## 0 06 075 010700 1400000US06075010700 06075010700 107 CT ## 1 06 075 012201 1400000US06075012201 06075012201 122.01 CT ## 2 06 075 013102 1400000US06075013102 06075013102 131.02 CT ## 3 06 075 013500 1400000US06075013500 06075013500 135 CT ## 4 06 075 015500 1400000US06075015500 06075015500 155 CT ## 5 06 075 016300 1400000US06075016300 06075016300 163 CT ## ALAND AWATER pop14 ## 0 183170 0 5311 ## 1 92048 0 4576 ## 2 237886 0 2692 ## 3 235184 0 2592 ## 4 311339 0 3918 ## 5 245867 0 4748
plot(sftracts)
A spatial join associates rows of data in one object with rows in another object based on the spatial relationship between the two objects.
A spatial join is based on the comparison of two sets of geometries in the same coordinate space.
The default spatial relationship is intersects
We need to spatially join the sftracts and sfhomes15_utm to answer this.
What spatial object are we joining data from? to?
Spatial overlay operations in R are implemented using over
Simple point-in-polygon operations use sp::over
More complex geometric comparisons use rgeos::over
You need to have rgeos package installed!
over(x,y)You can interperet over(x,y) as:
You can interperet over(x,y, returnList=TRUE) as:
See ?over for details.
In what census tract is each SF property located?
homes_with_tracts <- over(sfhomes15_utm, sftracts)
If not, why not?
The over function, like almost all spatial analysis functions, requires that both data sets be spatial objects (they are) with the same coordinate reference system (CRS). Let's investigate
# What is the CRS of the property data? proj4string(sfhomes15_utm) # What is the CRS of the census tracts? proj4string(sftracts)
# Transform the CRS for tracts to be the same as that for sfhomes15_sp sftracts_utm <- spTransform(sftracts, CRS(proj4string(sfhomes15_utm))) # make sure the CRSs are the same proj4string(sftracts_utm) == proj4string(sfhomes15_utm)
## [1] TRUE
Now let's try that overlay operation again
In what tract is each SF property is located?
homes_with_tracts <- over(sfhomes15_utm, sftracts_utm)
over outputWhat is our output? Does it answer our question?
What type of data object did the over function return?
Do we need to add returnList = TRUE
homes_with_tracts <- over(sfhomes15_utm, sftracts_utm) class(homes_with_tracts) nrow(homes_with_tracts) nrow(sftracts_utm) nrow(sfhomes15_utm)
homes_with_tracts <- over(sfhomes15_utm, sftracts_utm) class(homes_with_tracts)
## [1] "data.frame"
nrow(homes_with_tracts)
## [1] 835
nrow(sftracts_utm)
## [1] 195
nrow(sfhomes15_utm)
## [1] 835
head(homes_with_tracts)
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ## 24 06 075 030800 1400000US06075030800 06075030800 308 CT ## 35 06 075 061400 1400000US06075061400 06075061400 614 CT ## 76 06 075 060700 1400000US06075060700 06075060700 607 CT ## 79 06 075 033203 1400000US06075033203 06075033203 332.03 CT ## 85 06 075 020800 1400000US06075020800 06075020800 208 CT ## 90 06 075 011200 1400000US06075011200 06075011200 112 CT ## ALAND AWATER pop14 ## 24 1243203 0 5646 ## 35 797747 0 5625 ## 76 2082513 844390 9804 ## 79 437672 0 3931 ## 85 346098 0 6182 ## 90 177415 0 3078
over discussionOur output homes_with_tracts is a data frame that contains
sfhomes15_utm - in same ordersfhomes15_utmsftracts_utm@data including the census tract id (GEOID)So we are close to answering our question.
But for the data to be useful we need - to add the GEOID column in homes_with_tracts to sfhomes15_utm
CAUTION: this only works because the data are in the same order!
sfhomes15_utm$home_geoid <- homes_with_tracts$GEOID # Review and note the change #head(sfhomes15_utm@data)
Data linkage via space!
The over operation gave us the census tract data info for each point in sfhomes15_utm
We added the GEOID for each point to the @data slot of sfhomes15_utm
We can now join sfhomes15_utm points by GEOID to any census variable, eg median household income, and then do an analysis of the relationship between, for example, property value and that variable.
How would we do that?
Because GEOIDs can have leading zeros, we set the colClasses to make sure they are not stripped.
med_hh_inc <- read.csv("data/sf_med_hh_income2015.csv", stringsAsFactors = F,
colClasses = c("character","numeric"))
head(med_hh_inc)
## GEOID medhhinc ## 1 06075980401 0 ## 2 06075990100 0 ## 3 06075012502 11925 ## 4 06075012301 13909 ## 5 06075061100 16545 ## 6 06075980501 16638
We can use sp::merge to join the med_hh_inc DF to the sfhomes15_utm SPDF.
We should make sure that they share a column of common values - GEOID / home_geoid
We use sp:merge not regular merge to maintain the integrity of the sp object
When we join two data objects based on common values in a column.
sp:merge makes this syntax simple for S*DF objects
sfhomes15_utm <- merge(sfhomes15_utm,
med_hh_inc, by.x="home_geoid", by.y="GEOID")
head(sfhomes15_utm@data, 2) # take a look
## home_geoid FiscalYear SalesDate Address ## 549 06075030800 2015 2015-08-21 0000 2760 19TH AV0015 ## 758 06075061400 2015 2015-08-13 0000 0560AMISSOURI ST0000 ## YearBuilt NumBedrooms NumBathrooms NumRooms NumStories NumUnits ## 549 1979 2 2 5 0 1 ## 758 2003 2 2 5 1 1 ## AreaSquareFeet ImprovementValue LandValue Neighborhood ## 549 1595 432500 432500 West of Twin Peaks ## 758 1191 701280 701280 Potrero Hill ## Location SupeDistrict totvalue SalesYear ## 549 (37.7360097396496, -122.474067310226) 7 865000 2015 ## 758 (37.759197817252, -122.396516184449) 10 1402560 2015 ## medhhinc ## 549 131699 ## 758 138897
IMPORTANT: DO NOT merge a data frame to the @data slot!
Join it to the SPDF!
library(ggplot2) ggplot(sfhomes15_utm@data, aes(x=totvalue, y=medhhinc)) + geom_point() + geom_smooth(method=lm)
We now know the tract for each property.
Now let's think about this question from the tract perspective.
Let's ask the question
Since we joined GEOID to each property we can use the non-spatial aggregate function to compute the mean of totvalues for each GEOID.
But let's use the sp aggregate function instead!
It's actually more straight forward.
tracts_with_mean_val <- aggregate(x = sfhomes15_utm["totvalue"],
by = sftracts_utm, FUN = mean)
Wow, so simple. What does that give us?
What type of spatial object is returned by sp::aggregate
What values are in the @data slot?
class(tracts_with_mean_val)
## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"
head(tracts_with_mean_val@data)
## totvalue ## 0 NA ## 1 482000.0 ## 2 702000.0 ## 3 2825000.0 ## 4 800983.3 ## 5 1946666.7
nrow(tracts_with_mean_val) == nrow(sftracts_utm)
## [1] TRUE
sp::aggregate returned a SpatialPolygonsDataFrame
The SPDF has the same geometry as sftracts_utm
But the tracts_with_mean_val@data slot only contains the mean totvalue for each tract.
To make these data more useful, let's add this value to sftracts_utm
This only works because their are the same number of elements both @data slots and they are in the same order!
sftracts_utm$mean_totvalue <- tracts_with_mean_val$totvalue head(sftracts_utm@data) # check it
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ## 0 06 075 010700 1400000US06075010700 06075010700 107 CT ## 1 06 075 012201 1400000US06075012201 06075012201 122.01 CT ## 2 06 075 013102 1400000US06075013102 06075013102 131.02 CT ## 3 06 075 013500 1400000US06075013500 06075013500 135 CT ## 4 06 075 015500 1400000US06075015500 06075015500 155 CT ## 5 06 075 016300 1400000US06075016300 06075016300 163 CT ## ALAND AWATER pop14 mean_totvalue ## 0 183170 0 5311 NA ## 1 92048 0 4576 482000.0 ## 2 237886 0 2692 702000.0 ## 3 235184 0 2592 2825000.0 ## 4 311339 0 3918 800983.3 ## 5 245867 0 4748 1946666.7
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(sftracts_utm) + tm_polygons(col="mean_totvalue", border.col=NA)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(sftracts_utm) + tm_polygons(col="mean_totvalue", border.col=NA) + tm_shape(sfhomes15_utm) + tm_dots()
Many methods of spatial analysis are based on distance queries.
For example, point pattern analysis considers the distance between features to determine whether or not they are clustered.
We can also use distance as a way to select features spatially.
The other side of distance!
What properties are within walking distance of BART?
In order to select properties with 1KM of BART - create a 1KM radius buffer polygon around each BART point
We then do a point-in-polygon operation to either count the number of properties within the buffer or compute the mean totvalue.
rgeos is the muscle for
We can use the rgeos::gBuffer function to create our buffer polygon
The rgeos::gBuffer function takes as input a spatial object or objects to buffer and a buffer distance.
Let's assume 1KM is standard walking distance.
bart_1km_buffer <- gBuffer(sfbart_utm, width=1000)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(bart_1km_buffer) + tm_polygons(col="red") + tm_shape(sfbart_utm) + tm_dots()
How does this differ when we add byid=TRUE
bart_1km_buffer_byid <- gBuffer(sfbart_utm, width=1000, byid=TRUE)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(bart_1km_buffer_byid) + tm_polygons(col="red") + tm_shape(sfbart_utm) + tm_dots()
What operation could we use?
Which buffer polygons?
Why byid=TRUE
What is the output?
homes_near_bart <- gIntersects(bart_1km_buffer, sfhomes15_utm, byid=TRUE) class(homes_near_bart)
## [1] "matrix"
head(homes_near_bart)
## buffer ## 24 FALSE ## 35 FALSE ## 76 FALSE ## 79 FALSE ## 85 TRUE ## 90 FALSE
# subset sfhomes15_utm_near_bart <- sfhomes15_utm[as.vector(homes_near_bart),]
tm_shape(bart_1km_buffer) + tm_polygons(col="red") + tm_shape(sfhomes15_utm_near_bart) + tm_dots()
That was a whirlwind tour of just some of the methods of spatial analysis.
There was a lot we didn't and can't cover.
sf package is emerging and will eclipse sp
Raster data is a another major topic! - but the raster package is the key
library(knitr)
purl("r-geospatial-workshop-feb2018-pt2.Rmd", output = "scripts/r-geospatial-workshop-feb2018-pt2.r", documentation = 2)